    Bayesian Nonparametric Methods for Protein Structure Prediction

    The protein structure prediction problem consists of determining a protein’s three-dimensional structure from the underlying sequence of amino acids. A standard approach for predicting such structures is to conduct a stochastic search of conformation space in an attempt to find a conformation that optimizes a scoring function. For one subclass of prediction protocols, called template-based modeling, a new protein is suspected to be structurally similar to other proteins with known structure. The solved related proteins may be used to guide the search of protein structure space. There are many potential applications for statistics in this area, ranging from the development of structure scores to improving search algorithms. This dissertation focuses on strategies for improving structure predictions by incorporating information about closely related “template” protein structures into searches of protein conformation space. This is accomplished by generating density estimates on conformation space via various simplifications of structure models. By concentrating a search for good structure conformations in areas that are inhabited by similar proteins, we improve the efficiency of our search and increase the chances of finding a low-energy structure. In the course of addressing this structural biology problem, we present a number of advances to the field of Bayesian nonparametric density estimation. We first develop a method for density estimation with bivariate angular data that has applications to characterizing protein backbone conformation space. We then extend this model to account for multiple angle pairs, thereby addressing the problem of modeling protein regions instead of single sequence positions. In the course of this analysis we incorporate an informative prior into our nonparametric density estimate and find that this significantly improves performance for protein loop prediction. 
    The final piece of our structure prediction strategy is to connect side-chain locations to our torsion angle representation of the protein backbone. We accomplish this by using a Bayesian nonparametric model for dependence that can link together two or more multivariate marginal distributions. In addition to its application to our angular-linear data distribution, this dependence model can serve as an alternative to nonparametric copula methods.
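
As a rough illustration of what a density estimate on bivariate angular data looks like, the sketch below evaluates a fixed-bandwidth kernel estimate built from products of von Mises kernels. This is a much simpler stand-in, not the Bayesian nonparametric model developed in the dissertation. The function name `torus_kde`, the concentration parameter `kappa`, and the sample angle pairs are all assumptions made purely for illustration.

```python
import numpy as np


def torus_kde(phi, psi, data, kappa=20.0):
    """Kernel density estimate at the angle pair (phi, psi), in radians.

    Each observation contributes a product of two von Mises kernels, so the
    estimate is periodic in both angles and integrates to 1 over the torus.
    This is an illustrative stand-in, not the dissertation's model.
    """
    data = np.asarray(data, dtype=float)
    # Normalizing constant of one product kernel; np.i0 is the modified
    # Bessel function of the first kind, order 0.
    norm = (2.0 * np.pi * np.i0(kappa)) ** 2
    kernels = np.exp(
        kappa * (np.cos(phi - data[:, 0]) + np.cos(psi - data[:, 1]))
    )
    return kernels.mean() / norm


# Hypothetical backbone angle pairs in radians (made up, not measured data).
sample_angles = [(-1.0, -0.8), (-1.1, -0.7), (-2.2, 2.4)]

near = torus_kde(-1.0, -0.8, sample_angles)  # close to two observations
far = torus_kde(2.0, 0.0, sample_angles)     # far from every observation
print(near, far)
```

A larger `kappa` gives narrower kernels and a spikier estimate. A search guided by such a density would propose conformations more often where the estimate is high, which is the general idea the abstract describes for template-informed search.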